Introduction

This report is based on a dataset of patients at the Hospital Austral who suffered from heart disease and were treated there with surgery or angioplasty. Its goal is to predict the intervention a future patient may need.

Analysis

In this section, we analyze the factors and variables that may be taken into account in the predictor.

Percentage of men and women and age distribution for each sex

Given the dataset, we thought it was important to start by analyzing the percentage of men and women in the registry. Even though sex does not tend to be a primary factor for the predictor, it is worth showing the difference to see whether we need to use it.

It is also relevant to analyze the age difference between men and women. The density graph makes it easy to compare the age distribution of each sex.

Age for COPD, Obesity, Diabetes and Dialysis

We found it very useful to see the number of patients with each risk factor in relation to age. Even though age does not usually tend to be fundamental in leading to cardiac issues, it unfortunately raises the probability of acquiring other risk factors, such as COPD, obesity, diabetes or dialysis.

Percentage of patients who undergo angioplasty, surgery or endovalvula

This graph contains relevant information for starting to analyze the dataset in terms of what we are going to predict: which procedure a patient with cardiac problems should undergo.

This easy-to-read pie chart shows that angioplasties are more frequent. However, surgeries are also very common.

Percentage of reasons for admission per procedure

These two graphs show how important the reason for admission is, and that the majority of patients were admitted on a scheduled date.

Interpretation

We interpret the dataset with graphs that cross multiple variables.

Number of patients with risk factors

Using two Venn diagrams, we can easily identify patients with multiple risk factors; for example, six male patients suffer from both obesity and diabetes. This type of graph is very useful to see the procedure chosen for each patient and how multiple factors may affect the decision.

Male

Female

Number of patients undergoing each procedure by age range

Here is where things start to get interesting. We check how important age is as a variable and how much weight we should give it when we build the predictor.

Patients with each risk factor undergoing each procedure

As in the previous graph, we see how important a variable is, in this case the risk factors. The majority of patients with diabetes undergo angioplasty, whereas the majority of obese patients undergo surgery. This is important information to take into account for the predictor.
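The comparison described above amounts to a cross-tabulation of risk factor against procedure. The following is a minimal sketch of that idea in Python, not the code used for the report; the records are hypothetical placeholders rather than values from the dataset, and the function name `procedure_share` is ours.

```python
from collections import Counter

# Hypothetical (risk_factor, procedure) pairs, for illustration only.
# These are NOT values taken from the Hospital Austral dataset.
records = [
    ("diabetes", "angioplasty"),
    ("diabetes", "angioplasty"),
    ("diabetes", "surgery"),
    ("obesity", "surgery"),
    ("obesity", "surgery"),
    ("obesity", "angioplasty"),
]


def procedure_share(records):
    """Return {risk_factor: {procedure: share of patients with that factor}}."""
    pair_counts = Counter(records)
    factor_totals = Counter(factor for factor, _ in records)
    shares = {}
    for (factor, procedure), n in pair_counts.items():
        shares.setdefault(factor, {})[procedure] = n / factor_totals[factor]
    return shares


shares = procedure_share(records)
for factor, by_procedure in shares.items():
    print(factor, by_procedure)
```

Dividing by the per-factor total, rather than by the whole dataset, is what allows statements such as "the majority of patients with diabetes undergo angioplasty".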

Predictor

Once we had analyzed and interpreted the given dataset, we selected the variables to use for the model: age, risk factors and number of injuries. We then used 70% of the data to train the model and the remaining 30% to test how effective it actually is.
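A 70/30 partition like the one described can be sketched as follows. This is only an illustration under our own assumptions: the report does not state how the split was made, so the random shuffle, the seed and the function name `train_test_split` are hypothetical, and the rows are stand-ins for patient records.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed keeps the partition reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for patient records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting avoids a partition that simply follows the order in which patients were registered.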

With the test partition, we show the results in a confusion matrix, which contrasts the predicted values with the actual values.

              Angioplasty   Surgery
Angioplasty            68         2
Surgery                 7        27
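
The matrix can be summarized with a few standard metrics. The sketch below assumes that rows are actual classes and columns are predicted classes, which the report does not state; if the orientation is the opposite, the two rates change but the accuracy does not. Treating surgery as the positive class is also our choice.

```python
# Confusion matrix from the test partition.
# ASSUMPTION: matrix[actual][predicted]; the report does not state the orientation.
matrix = {
    "Angioplasty": {"Angioplasty": 68, "Surgery": 2},
    "Surgery": {"Angioplasty": 7, "Surgery": 27},
}


def accuracy(matrix):
    """Share of test patients whose procedure was predicted correctly."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def rates(matrix, positive, negative):
    """Return (true positive rate, false positive rate) for the positive class."""
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    return tp / (tp + fn), fp / (fp + tn)


acc = accuracy(matrix)
tpr, fpr = rates(matrix, positive="Surgery", negative="Angioplasty")
print(round(acc, 3), round(tpr, 3), round(fpr, 3))
```

The accuracy is (68 + 27) / 104, about 0.913, regardless of orientation. Under the assumed orientation, the true positive rate for surgery is 27/34, about 0.794, and the false positive rate is 2/70, about 0.029.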

Finally, with a ROC curve and its AUROC, we compare the true positive rate against the false positive rate. Here, we can confirm that the ratio of false positives is low.
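
To make the relationship between the ROC curve and the AUROC concrete, the following sketch builds the curve by sweeping a threshold over predicted scores and integrates it with the trapezoidal rule. The labels and scores are hypothetical toy values, not the model's outputs, and the function names are ours.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auroc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auroc(roc_points(labels, scores))
print(area)
```

An AUROC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to random ranking.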

Team

Ignacio Fernandez Battolla

Ignacio Chalub

Segundo Marcaida

Mateo Valle

Federico Pochat